Noise-Robust Speech Recognition Technologies in Mobile Environments
Authors
Abstract
In mobile environments, voice is a simple and effective means of user input and device operation. However, speech recognition performance is strongly affected by ambient noise in these environments and can deteriorate significantly, so there is strong demand for improvement. To address this, we conducted research in two directions: “advancement of multi-modal speech recognition” and “noise processing technologies.”

1) Research Related to Advancement of Multi-Modal Speech Recognition

a) Multi-modal speech recognition using side-face moving image data

We propose a multi-modal speech recognition method that uses moving image data of the side of the face and the lips as part of a noise-robust speech recognition method for mobile environments. Because the method relies on side-face image data, the user can speak in a natural posture. Figure 1 shows the configuration of the proposed multi-modal speech recognition. In this method, acoustic and image data are merged using a multi-stream Hidden Markov Model (HMM) to improve recognition performance.

b) Examination of a stream-weight optimization method in multi-modal speech recognition

For multi-modal speech recognition of acoustic and image information using the aforementioned multi-stream HMM, we propose optimizing a normalized likelihood criterion, and confirm that, when the amount of sample data is small, the error rate can be reduced by approximately 40% compared with the conventional likelihood-ratio maximization method.

2) Research on Noise Processing Technologies for Speech Recognition

The signal input to the recognition system is continuous and carries no decisive information about utterance end-points. Techniques for automatically recognizing a continuous speech signal are therefore necessary. For this reason, we propose a method that automatically and robustly detects utterance intervals under conditions where the signal-to-noise ratio (SNR) changes over time, based on a tree-structured noise overlay speech model, and confirm that the proposed method improves speech recognition performance.
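The abstract says that acoustic and image data are merged with a multi-stream HMM and that the stream weights are optimized. As a minimal sketch of the general idea only, and not the authors' implementation, the common multi-stream formulation combines per-stream log-likelihoods as a weighted sum, with weights that sum to one. The function names, the word-level scores, and the example values below are all hypothetical and chosen for illustration.

```python
def combine_stream_scores(audio_loglik, visual_loglik, audio_weight):
    """Weighted sum of two stream log-likelihoods (weights sum to 1).

    This is the generic multi-stream combination rule; the weight value
    itself would normally come from an optimization step that is not
    shown here.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return audio_weight * audio_loglik + visual_weight * visual_loglik


def recognize(audio_scores, visual_scores, audio_weight):
    """Pick the candidate with the highest combined log-likelihood.

    audio_scores / visual_scores map each candidate word to a
    (hypothetical) log-likelihood produced by that stream's model.
    """
    combined = {
        word: combine_stream_scores(
            audio_scores[word], visual_scores[word], audio_weight
        )
        for word in audio_scores
    }
    best_word = max(combined, key=combined.get)
    return best_word, combined


# Hypothetical scores: the noisy audio stream barely separates the two
# candidates, while the visual (lip image) stream clearly prefers "yes".
audio_scores = {"yes": -12.0, "no": -11.5}
visual_scores = {"yes": -4.0, "no": -9.0}

audio_only, _ = recognize(audio_scores, visual_scores, audio_weight=1.0)
balanced, _ = recognize(audio_scores, visual_scores, audio_weight=0.5)
print("audio only:", audio_only)
print("equal weights:", balanced)
```

In this toy setting, lowering the audio weight lets the visual stream override an unreliable acoustic score, which is why the choice of stream weights matters under noise.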
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
The Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
New Concept Service for the Mobile Era Using Speech Technologies
In this paper, we describe new concept services based on speech processing technologies for the new digital/mobile era called a ubiquitous society. First, we propose a compact and noise robust embedded speech recognition middleware implemented on microprocessors aiming for sophisticated HMIs (Human Machine Interfaces) of car information systems. The compactness is essential for embedded systems...
A New Method for Robust Speech Recognition Based on Missing Data Using a Bidirectional Neural Network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced by the remaining components and statistical ...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Robust speech interaction in a mobi use of multiple and differen
Mobile and outdoor environments have long been out of reach for speech engines due to the performance limitations that were associated with portable devices, and the difficulties of processing speech in high-noise areas. This paper outlines an architecture for attaining robust speech recognition rates in a mobile pedestrian indoor/outdoor navigation environment, through the use of a media fusio...